Conversation
| raft::device_vector_view<float, int64_t> mu, | ||
| raft::device_scalar_view<float, int64_t> noise_vars, | ||
| bool flip_signs_based_on_U = false); | ||
|
|
There was a problem hiding this comment.
Making a note here that I don't think the existing cuml implementation has the ability to tune the percentage of explained variance.
For example, in sklearn we can set 0 < n_components < 1 where the user can select a percentage of the explained variance to recover and the n_components is automatically determined by the algorithm in order to satisfy that.
We will have to build that piece out since it doesn't exist in the current implementation.
There was a problem hiding this comment.
Tuning is not what's being asked for. Exposing the explained variance is what's being requested (it is used for tuning / selecting the number of components but that's something the user does, not something we need to do).
| prms.copy = config.copy; | ||
| prms.whiten = config.whiten; | ||
| return prms; |
There was a problem hiding this comment.
| prms.copy = config.copy; | |
| prms.whiten = config.whiten; | |
| return prms; | |
| prms.copy = config.copy; | |
| prms.whiten = config.whiten; | |
| prms.verbose = config.verbose; | |
| return prms; |
We are missing verbose here, no?
There was a problem hiding this comment.
verbose was a unused parameter, removed it from pca.hpp in 99f32fc
cpp/tests/preprocessing/pca.cu
Outdated
| // prms.n_cols = params.n_col; | ||
| // prms.n_rows = params.n_row; |
| * @param[in] flip_signs_based_on_U whether to determine signs by U (true) or V.T (false) | ||
| */ | ||
| void fit(raft::resources const& handle, | ||
| params config, |
There was a problem hiding this comment.
Small thing but important for API consistency: params config is passed by value here and in all other functions in the PR. The cuVS convention afaik is const params& for example in kmeans::fit:
void fit(raft::resources const& handle,
const cuvs::cluster::kmeans::params& params,
raft::device_matrix_view<const float, int> X,
std::optional<raft::device_vector_view<const float, int>> sample_weight,
raft::device_matrix_view<float, int> centroids,
raft::host_scalar_view<float> inertia,
raft::host_scalar_view<int> n_iter);This won't affect performance or correctness at all, but for consistency I would suggest changing to const params& config throughout. Applies to all 8 overloads in this header plus the detail layer.
cpp/tests/preprocessing/pca.cu
Outdated
| } | ||
|
|
||
| protected: | ||
| void basicTest() |
There was a problem hiding this comment.
Both basicTest and advancedTest set n_components = n_col, so the inverse transform should perfectly reconstruct the input.
Consider adding a test with n_components < n_col that verifies the reconstruction error is bounded but non-zero, this would confirm the dimensionality reduction is actually working, not just passing data through unchanged and be a useful test case.
divyegala
left a comment
There was a problem hiding this comment.
Why were double instantiations needed? Where is the code intended to be used?
Co-authored-by: Dante Gama Dessavre <dante.gamadessavre@gmail.com>
Co-authored-by: Dante Gama Dessavre <dante.gamadessavre@gmail.com>
The cuml pca python interface supports double inputs. However, cuml will use the raft api, so therefore cuvs does not need double instantiations if we think its not valuable. |
In that case, please remove |
52894e3 to
465e546
Compare
Resolves #1207. Depends on rapidsai/raft#2952
This PR introduces the
cuvs::preprocessing::pcawithfloatsupport. The following APIs are supported:fit,transform,fit_transform,inverse_transform.